On Statistical Significance of Signal 
Abstract: A definition of the statistical significance of a signal in an experiment is proposed by establishing a correspondence between the observed p-value and the integral probability of the normal distribution. The definition is suitable for both counting experiments and continuous test statistics, and explicit expressions for calculating the statistical significance in both cases are given.
1 Introduction

The statistical significance of a signal in a particle physics experiment quantifies the degree of confidence with which the observation either confirms or disproves a null hypothesis $H_0$ in favor of an alternative hypothesis $H_1$. Usually $H_0$ stands for the known, or background, processes, while the alternative hypothesis $H_1$ stands for a new, or signal, process plus the background processes, each with its respective production cross section. This concept is useful even for ordinary measurements, since it gives an intuitive estimate of the extent to which the observed phenomena can be attributed to backgrounds or to a signal. It becomes crucial for measurements that claim a new discovery or a new signal. By convention in particle physics experiments, the "$5\sigma$" standard, namely a statistical significance $S \geq 5$, is required to claim the sensitivity for discovery, while for $S \geq 3$ ($S \geq 2$) one may claim strong (weak) evidence for the observed signal.

However, as pointed out in Ref. [1], the concept of statistical significance has not been employed consistently in the most important discoveries made over the last quarter century. The definitions of statistical significance also differ from one measurement to another. Listed below are various definitions of the statistical significance for counting experiments (see, for example, Refs. [2-4]):



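$$ S_1 = (n-b)/\sqrt{b}, \qquad (1) $$

$$ S_2 = (n-b)/\sqrt{n}, \qquad (2) $$

$$ S_{12} = \sqrt{n} - \sqrt{b}, \qquad (3) $$

$$ S_{B1} = S_1 - k(\alpha)\sqrt{n/b}, \qquad (4) $$

$$ S_{B2} = 2S_{12} - k(\alpha), \qquad (5) $$

$$ \int_{-\infty}^{S_N} N(0,1)\,dx = \sum_{i=0}^{n-1} \frac{e^{-b} b^i}{i!}, \qquad (6) $$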



where $n$ is the total number of observed events, a Poisson variable with expectation $s+b$; $s$ is the expected number of signal events being searched for, and $b$ is the known expected number of Poisson-distributed background events. All numbers are counted in the "signal region", where the signal events being searched for are expected to appear. In Eqs. (4) and (5), $k(\alpha)$ is a factor related to $\alpha$ such that the corresponding statistical significance assumes a $1-\alpha$ acceptance for a positive decision about signal observation, with $k(0.5) = 0$, $k(0.25) = 0.66$, $k(0.1) = 1.28$, $k(0.05) = 1.64$, etc. In Eq. (6), $N(0,1)$ denotes the normal density function with expectation and variance equal to 0 and 1, respectively. On the other hand, measurements in particle physics often examine statistical variables that are continuous in nature. Indeed, to identify a sample of events enriched in the signal process, it is often important to take into account the entire distribution of a given variable for a set of events, rather than just to count the events within a given signal region of values. In this situation, I. Narsky [4] gives a definition of the statistical significance via the likelihood function

$$ S_L = \sqrt{-2\ln\left[L(b)/L(s+b)\right]}, \qquad (7) $$

under the assumption that $-2\ln\left[L(b)/L(s+b)\right]$ is distributed as a $\chi^2$ function with one degree of freedom.
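To make the spread among Eqs. (1)-(7) concrete, the following Python sketch evaluates each definition for the same hypothetical counting result, $n = 10$ observed events with known background $b = 4$. It is a sketch under stated assumptions: it needs Python 3.8+ (for `statistics.NormalDist`), takes $k(\alpha)$ as a user-supplied number (here 1.64, the $\alpha = 0.05$ value quoted above), requires $n \geq 1$, and, for Eq. (7), assumes the signal yield is fitted as $\hat{s} = n - b$, so that $-2\ln[L(b)/L(s+b)] = 2[n\ln(n/b) - (n-b)]$ for a Poisson likelihood. That closed form is our illustration, not something taken from Ref. [4].

```python
import math
from statistics import NormalDist

def poisson_cdf(m, mu):
    """P(N <= m) for N ~ Poisson(mu); returns 0 for m < 0."""
    if m < 0:
        return 0.0
    term = math.exp(-mu)  # P(N = 0)
    total = term
    for i in range(1, m + 1):
        term *= mu / i     # P(N = i) obtained from P(N = i - 1)
        total += term
    return total

def counting_significances(n, b, k_alpha):
    """Evaluate Eqs. (1)-(7) for n >= 1 observed events and known background b."""
    s1 = (n - b) / math.sqrt(b)                  # Eq. (1)
    s2 = (n - b) / math.sqrt(n)                  # Eq. (2)
    s12 = math.sqrt(n) - math.sqrt(b)            # Eq. (3)
    sb1 = s1 - k_alpha * math.sqrt(n / b)        # Eq. (4)
    sb2 = 2 * s12 - k_alpha                      # Eq. (5)
    # Eq. (6): Phi(S_N) = sum_{i=0}^{n-1} e^{-b} b^i / i!
    s_n = NormalDist().inv_cdf(poisson_cdf(n - 1, b))
    # Eq. (7), assuming s is fitted as n - b, so that for a Poisson likelihood
    # -2 ln[L(b)/L(n)] = 2[n ln(n/b) - (n - b)].
    s_l = math.sqrt(2 * (n * math.log(n / b) - (n - b)))
    return {"S1": s1, "S2": s2, "S12": s12, "SB1": sb1,
            "SB2": sb2, "SN": s_n, "SL": s_l}

for name, value in counting_significances(n=10, b=4, k_alpha=1.64).items():
    print(f"{name:>4} = {value:.3f}")
```

For these inputs $S_1 = 3.0$ while $S_{B1} = 3 - 1.64\sqrt{2.5} \approx 0.41$, so the same observation is assigned significances differing by more than a factor of five depending on the definition chosen.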

In view of the above, we clearly need a self-consistent definition of statistical significance that avoids the danger that the same value of $S$ in different measurements implies substantially different statistical significance, and that is suitable for both counting experiments and continuous test statistics. In this letter we propose a definition of the statistical significance that could come closer to the desired properties stated above.

2 Definition of the statistical significance

The p-value is defined to quantify the level of agreement between the experimental data and a hypothesis. Assume an experiment measures a test statistic $t$ to be $t_{\rm obs}$, and that $t$ has the probability density function $g(t|H_0)$ if the null hypothesis $H_0$ is true. We further assume that large values of $t$ correspond to poor agreement between the data and the null hypothesis $H_0$; then the p-value of the experiment is

$$ p(t_{\rm obs}) = P(t > t_{\rm obs}|H_0) = \int_{t_{\rm obs}}^{\infty} g(t|H_0)\,dt. \qquad (8) $$



A very small p-value leads to rejection of the null hypothesis $H_0$.
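When $g(t|H_0)$ is not available in closed form, Eq. (8) is often estimated by generating pseudo-experiments under $H_0$ and counting how often $t$ exceeds $t_{\rm obs}$. The Python sketch below illustrates this with a hypothetical null density $g(t|H_0) = e^{-t}$ ($t \geq 0$), chosen only because the exact answer $e^{-t_{\rm obs}}$ is known; the value $t_{\rm obs} = 3$, the number of pseudo-experiments, and the function names are likewise assumptions.

```python
import math
import random

def mc_p_value(t_obs, sample_t_under_h0, n_toys=200_000, seed=1):
    """Monte Carlo estimate of Eq. (8): fraction of H0 toys with t > t_obs."""
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_toys) if sample_t_under_h0(rng) > t_obs)
    return exceed / n_toys

# Hypothetical null distribution: g(t|H0) = exp(-t) for t >= 0.
def exponential_t(rng):
    return rng.expovariate(1.0)

t_obs = 3.0
p_mc = mc_p_value(t_obs, exponential_t)
p_exact = math.exp(-t_obs)  # integral of exp(-t) from t_obs to infinity
print(f"Monte Carlo p = {p_mc:.4f}, exact p = {p_exact:.4f}")
```

The statistical uncertainty of such an estimate is about $\sqrt{p(1-p)/n_{\rm toys}}$, so the very small p-values that correspond to $S \geq 5$ in Table 1 require correspondingly large numbers of pseudo-experiments.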

Since the p-value of an experiment measures the consistency between the hypothesis $H_0$ and the measurement, we define the statistical significance $S$ through its relation to the p-value:

$$ \int_{-S}^{S} N(0,1)\,dx = 1 - p(t_{\rm obs}), \qquad (9) $$



under the assumption that the null hypothesis $H_0$ represents the case in which the observed events can be described by background processes alone. Because a small p-value indicates poor compatibility of the data with $H_0$, and hence favors $H_1$, a small p-value yields a large signal significance $S$, and vice versa. The left side of Eq. (9) is the probability that a normally distributed variable lies within $\pm S$ standard deviations ($\pm S\sigma$) of its mean; therefore, this definition conforms to the intended meaning of statistical significance. Some corresponding values of $S$ and the p-value under this definition are listed in Table 1.



Table 1. The statistical significance S and the corresponding p-value.

S     p-value
1     0.3173
2     0.0455
3     0.0027
4     6.3 × 10^-5
5     5.7 × 10^-7
6     2.0 × 10^-9
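Because $\int_{-S}^{S} N(0,1)\,dx = {\rm erf}(S/\sqrt{2})$, Eq. (9) gives $p = {\rm erfc}(S/\sqrt{2})$, and conversely $S = -\Phi^{-1}(p/2)$, where $\Phi$ is the standard normal cumulative distribution. The Python sketch below (standard library only, Python 3.8+) implements both directions and reproduces Table 1; the function names are our own.

```python
import math
from statistics import NormalDist

def p_value_from_significance(s):
    """Eq. (9) solved for p: p = 1 - integral_{-S}^{S} N(0,1) dx = erfc(S / sqrt(2))."""
    return math.erfc(s / math.sqrt(2.0))

def significance_from_p_value(p):
    """Eq. (9) solved for S: Phi(S) = 1 - p/2, i.e. S = -Phi^{-1}(p/2).

    Working with p/2 directly (rather than 1 - p/2) keeps precision for tiny p.
    Requires 0 < p <= 1.
    """
    return -NormalDist().inv_cdf(p / 2.0)

# Reproduce Table 1 and check the round trip S -> p -> S.
for s in range(1, 7):
    p = p_value_from_significance(s)
    print(f"S = {s}:  p = {p:.2e}  ->  S recovered = {significance_from_p_value(p):.6f}")
```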



3 Statistical significance in counting experiments

A class of particle physics experiments involves the search for new phenomena or signals by observing a unique class of events that cannot be described by background processes. This problem can be treated as a "counting experiment", in which one identifies a class of events using well-defined criteria, counts the number of observed events, and estimates the average rate of events contributed by various backgrounds in the signal region, where the signal events (if they exist) are clustered. Assume that in an experiment the number of signal events in the signal region is a Poisson variable with expectation $s$, while the number of events from backgrounds is a Poisson variable with a known expectation $b$ (without error); then the observed number of events is distributed as a Poisson variable with expectation $s+b$. If the experiment observes $n_{\rm obs}$ events in the signal region then, since $s = 0$ under $H_0$, the p-value is

$$ p(n_{\rm obs}) = P(n \geq n_{\rm obs}|H_0) = \sum_{n=n_{\rm obs}}^{\infty} \frac{e^{-b} b^n}{n!} = 1 - \sum_{n=0}^{n_{\rm obs}-1} \frac{e^{-b} b^n}{n!}. \qquad (10) $$



Substituting this relation into Eq. (9), one immediately has

$$ \int_{-S}^{S} N(0,1)\,dx = \sum_{n=0}^{n_{\rm obs}-1} \frac{e^{-b} b^n}{n!}. \qquad (11) $$



The signal statistical significance $S$ can then be easily determined. Comparing this equation with Eq. (6) given by Ref. [4], we notice that the lower limit of the integral is different: $-S$ here versus $-\infty$ in Eq. (6).
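The following Python sketch evaluates Eqs. (10) and (11) for a hypothetical observation $n_{\rm obs} = 10$ with known background $b = 4$, and compares the result with $S_N$ from Eq. (6). It sums the Poisson upper tail directly instead of computing $1 - \sum$, which avoids cancellation when $p$ is very small. The helper names and inputs are illustrative assumptions, and $b > 0$ is required.

```python
import math
from statistics import NormalDist

def poisson_upper_tail(n_obs, b):
    """Eq. (10): p = P(n >= n_obs | H0) for n ~ Poisson(b), b > 0."""
    if n_obs <= 0:
        return 1.0
    # P(n = n_obs), evaluated in log space to avoid overflow for large n_obs.
    term = math.exp(n_obs * math.log(b) - b - math.lgamma(n_obs + 1))
    total, i = 0.0, n_obs
    while True:
        total += term
        i += 1
        term *= b / i  # P(n = i) obtained from P(n = i - 1)
        # Once i > b the terms decrease monotonically; stop when negligible.
        if term == 0.0 or (i > b and term < 1e-17 * total):
            break
    return total

def counting_significance(n_obs, b):
    """Eq. (11): integral_{-S}^{S} N(0,1) dx = 1 - p, i.e. S = -Phi^{-1}(p/2)."""
    p = poisson_upper_tail(n_obs, b)
    return -NormalDist().inv_cdf(p / 2.0), p

def narsky_significance(n_obs, b):
    """Eq. (6): integral_{-inf}^{S_N} N(0,1) dx = 1 - p, i.e. S_N = -Phi^{-1}(p); needs n_obs >= 1."""
    return -NormalDist().inv_cdf(poisson_upper_tail(n_obs, b))

s, p = counting_significance(10, 4.0)
print(f"p = {p:.3e}, S (Eq. 11) = {s:.3f}, S_N (Eq. 6) = {narsky_significance(10, 4.0):.3f}")
```

For the same p-value, Eq. (11) gives a larger value than Eq. (6), because the p-value is converted through a two-sided rather than a one-sided normal interval; this is the practical consequence of the different lower limit of integration noted above.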

4 Statistical significance in continuous test statistics

The general problem in this situation can be stated as follows. Suppose we identify a class of events using well-defined criteria, characterized by a set of observations $X_1, X_2, \ldots, X_N$ of a random variable $X$. In addition, one has a hypothesis to test that predicts the probability density function of $X$, say $f(X|\theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a set of parameters that must be estimated from the data. The problem is then to define a statistic that measures the consistency between the distribution of the data and the distribution given by the hypothesis.

To be concrete, suppose the random variable $X$ is, say, an invariant mass, and the $N$ observations $X_1, X_2, \ldots, X_N$ give an experimental distribution of $X$. Write the parameters as $\theta = (\theta_1, \theta_2, \ldots, \theta_k) = (\theta_s; \theta_b)$, where $\theta_s$ and $\theta_b$ represent the parameters related to the signal (say, a resonance) and to the background contributions, respectively. We take the null hypothesis $H_0$ to be that the experimental distribution of $X$ can be described by the background processes alone, while the alternative hypothesis $H_1$ is that the experimental distribution of $X$ should be described by the backgrounds plus the signal. Namely, the null hypothesis $H_0$ specifies fixed value(s) for a subset of the parameters $\theta_s$ (the number of fixed parameters is denoted by $r$), while the alternative hypothesis $H_1$ leaves the $r$ parameter(s) free to take any value(s) other than those specified in $H_0$. Therefore, under $H_0$ the parameters $\theta$ are restricted to lie in a subspace $\omega$ of their total space $\Omega$. On the basis of a data sample of size $N$ from $f(X|\theta)$, we want to test the hypothesis $H_0$: $\theta \in \omega$. Given the observations $X_1, X_2, \ldots, X_N$, the likelihood function is $L = \prod_{i=1}^{N} f(X_i|\theta)$. The maximum of this function over the total space $\Omega$ is denoted by $L(\hat\Omega)$, while the maximum of the likelihood function within the subspace $\omega$ is denoted by $L(\hat\omega)$. We then define the likelihood ratio $\lambda = L(\hat\omega)/L(\hat\Omega)$. It can be shown that, for $H_0$ true, the statistic

$$ t = -2\ln\lambda = 2\left[\ln L_{\max}(s+b) - \ln L_{\max}(b)\right] \qquad (12) $$

is distributed as $\chi^2(r)$ when the sample size $N$ is large. In Eq. (12), $L_{\max}(s+b)$ and $L_{\max}(b)$ denote $L(\hat\Omega)$ and $L(\hat\omega)$, respectively. If $\lambda$ turns out to be in the neighborhood of 1, the null hypothesis $H_0$ renders $L(\hat\omega)$ close to the maximum $L(\hat\Omega)$, and hence $H_0$ is compatible with the data. On the other hand, a small value of $\lambda$ indicates that $H_0$ is unlikely. Therefore, the critical region of $\lambda$ is in the neighborhood of 0, corresponding to large values of the statistic $t$. If the measured value of $t$ in an experiment is $t_{\rm obs}$, then from Eq. (8) the p-value is

$$ p(t_{\rm obs}) = \int_{t_{\rm obs}}^{\infty} \chi^2(t;r)\,dt, \qquad (13) $$

where $\chi^2(t;r)$ denotes the $\chi^2$ probability density with $r$ degrees of freedom.



Therefore, in terms of Eq. (9), we can calculate the signal significance from the following expression:

$$ \int_{-S}^{S} N(0,1)\,dx = 1 - p(t_{\rm obs}) = \int_{0}^{t_{\rm obs}} \chi^2(t;r)\,dt. \qquad (14) $$
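For integer $r$, the $\chi^2$ tail in Eqs. (13) and (14) equals the regularized upper incomplete gamma function $Q(r/2, t/2)$, which has closed forms built from ${\rm erfc}$ and finite sums, so no statistics library is needed. The Python sketch below uses those standard expressions for even and odd $r$; the function names and the example value $t_{\rm obs} = 25$ are our own illustrative choices.

```python
import math
from statistics import NormalDist

def chi2_sf(t, r):
    """Eq. (13): p = integral_t^inf chi2(t'; r) dt' = Q(r/2, t/2), integer r >= 1."""
    x = t / 2.0
    if r % 2 == 0:
        # Q(m, x) = e^{-x} * sum_{k=0}^{m-1} x^k / k!   with m = r/2
        return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(r // 2))
    # Q(m + 1/2, x) = erfc(sqrt(x)) + e^{-x} * sum_{k=1}^{m} x^(k-1/2) / Gamma(k+1/2)
    return math.erfc(math.sqrt(x)) + math.exp(-x) * sum(
        x ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (r - 1) // 2 + 1))

def significance_from_lr(t_obs, r):
    """Eq. (14): integral_{-S}^{S} N(0,1) dx = 1 - p(t_obs), so S = -Phi^{-1}(p/2)."""
    p = chi2_sf(t_obs, r)
    return -NormalDist().inv_cdf(p / 2.0)

for r in (1, 2, 3):
    print(f"r = {r}: t_obs = 25 -> S = {significance_from_lr(25.0, r):.3f}")
```

For $r = 1$ this reproduces $S = \sqrt{t_{\rm obs}}$, the result derived next; for $r > 1$ the same $t_{\rm obs}$ corresponds to a smaller $S$, since extra free parameters make large values of $t$ more probable under $H_0$.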



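For the case $r = 1$, we have

$$ \int_{-S}^{S} N(0,1)\,dx = \int_{0}^{t_{\rm obs}} \chi^2(t;1)\,dt = 2\int_{0}^{\sqrt{t_{\rm obs}}} N(0,1)\,dx, $$

and immediately obtain

$$ S = \sqrt{t_{\rm obs}} = \left[2\left(\ln L_{\max}(s+b) - \ln L_{\max}(b)\right)\right]^{1/2}, \qquad (15) $$

which is identical to Eq. (7) given by Ref. [4].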
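As an illustration of Eqs. (12) and (15) with $r = 1$, the Python sketch below performs an unbinned likelihood fit on a hypothetical invariant-mass-like sample in $[0, 1]$: 200 background events drawn from a uniform density plus 30 signal events from a Gaussian of fixed mean 0.5 and width 0.05. The model $f(X|\theta) = (1 - f_s) + f_s\,G(X)$ has a single signal parameter, the signal fraction $f_s$, with $H_0$: $f_s = 0$; since the background shape has no free parameters here, $L_{\max}(b) = L(f_s = 0)$. The sample sizes, shapes, random seed, and the ternary-search maximizer are all illustrative assumptions, and the Gaussian's truncation at the window edges (10 widths away) is neglected.

```python
import math
import random

MU, SIGMA = 0.5, 0.05  # hypothetical fixed signal shape

def gauss(x):
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def log_likelihood(data, f_s):
    """ln L for f(X) = (1 - f_s) * Uniform[0,1] + f_s * Gauss(MU, SIGMA)."""
    return sum(math.log((1.0 - f_s) + f_s * gauss(x)) for x in data)

def fit_signal_fraction(data, iters=200):
    """Maximize ln L over f_s in [0, 1]; ln L is concave in f_s, so ternary search works."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if log_likelihood(data, m1) < log_likelihood(data, m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

rng = random.Random(7)
data = [rng.random() for _ in range(200)]  # background events
while len(data) < 230:                     # 30 signal events kept inside [0, 1]
    x = rng.gauss(MU, SIGMA)
    if 0.0 <= x <= 1.0:
        data.append(x)

f_hat = fit_signal_fraction(data)
# Eq. (12): t = 2[ln L_max(s+b) - ln L_max(b)]; clip tiny negative round-off.
t_obs = max(0.0, 2.0 * (log_likelihood(data, f_hat) - log_likelihood(data, 0.0)))
S = math.sqrt(t_obs)  # Eq. (15), valid for r = 1
print(f"fitted signal fraction = {f_hat:.3f}, t_obs = {t_obs:.2f}, S = {S:.2f}")
```

Concavity holds because each term $\ln[1 + f_s(G(X_i) - 1)]$ is the logarithm of a function linear in $f_s$; with more than one signal parameter a general-purpose maximizer would be needed instead.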

5 Discussion and summary

In Section 2, the p-value defined by Eq. (8) is based on the assumption that large values of $t$ correspond to poor agreement between the null hypothesis $H_0$ and the observed data; namely, the critical region of the statistic $t$ for $H_0$ lies in the upper tail of its distribution. If instead the critical region of $t$ lies in the lower tail of its distribution, then Eq. (8) should be replaced by

$$ p(t_{\rm obs}) = P(t < t_{\rm obs}|H_0) = \int_{-\infty}^{t_{\rm obs}} g(t|H_0)\,dt, \qquad (16) $$



and the definition of the statistical significance $S$ expressed by Eq. (9) is still applicable. When the critical region of the statistic $t$ for $H_0$ lies in both the lower and upper tails of its distribution, and the observed values $t^{-}_{\rm obs}$ and $t^{+}_{\rm obs}$ on the two sides are determined from the experiment, Eq. (8) should be replaced by

$$ p(t_{\rm obs}) = P(t < t^{-}_{\rm obs}|H_0) + P(t > t^{+}_{\rm obs}|H_0) = \int_{-\infty}^{t^{-}_{\rm obs}} g(t|H_0)\,dt + \int_{t^{+}_{\rm obs}}^{\infty} g(t|H_0)\,dt. \qquad (17) $$
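The lower-tail and two-tailed p-values of Eqs. (16) and (17) feed into Eq. (9) in the same way as Eq. (8). The Python sketch below assumes, purely for illustration, a test statistic that is standard normal under $H_0$, i.e. $g(t|H_0) = N(0,1)$, so every tail probability is a normal CDF value; the function names and the observed values $\pm 2.5$ are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # assumed g(t|H0): standard normal

def p_lower(t_obs):
    """Eq. (16): p = P(t < t_obs | H0)."""
    return STD.cdf(t_obs)

def p_two_tailed(t_lo, t_hi):
    """Eq. (17): p = P(t < t_lo | H0) + P(t > t_hi | H0)."""
    return STD.cdf(t_lo) + (1.0 - STD.cdf(t_hi))

def significance(p):
    """Eq. (9) solved for S: Phi(S) = 1 - p/2."""
    return -STD.inv_cdf(p / 2.0)

# Symmetric observed tails at -2.5 and +2.5 give S of about 2.5 via Eq. (9).
print(significance(p_two_tailed(-2.5, 2.5)))
# A single lower tail at -2.5 gives a smaller p, hence a larger S.
print(significance(p_lower(-2.5)))
```

For a symmetric null distribution and symmetric observed tails, $S$ reduces to $|t_{\rm obs}|$, which is a quick consistency check of the convention in Eq. (9).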



In summary, we have proposed a definition of the statistical significance by establishing a correspondence between the integral probability of the normal distribution and the p-value observed in an experiment; the definition is suitable for both counting experiments and continuous test statistics. Explicit expressions for calculating the statistical significance in counting experiments and for continuous test statistics are given, in terms of the Poisson probability and the likelihood ratio, respectively.